Supported Actions and Schemas

This section provides a comprehensive view of the Context API endpoint actions available in Knowledge Enrichment. Each action is documented with a clear description, usage examples, and its schema to help developers understand how to interact with the API effectively.

Note: If an action is specified with an invalid file format, the request is rejected with validation errors.

For detailed technical information associated with each Context API action, see Technical Information.

Classify Image into Categories

The Image Classification action classifies the input image into pre-defined categories. The following are the requirements:

  • Input files must be images.
  • The classes array must contain at least 2 distinct non-empty entries. Note: If no classification actions (image-classification or text-classification) are specified, then classes must be null or empty.

How it works:

  • You define at least two classification categories (classes).
  • The AI model analyzes the image content and determines the best-matching category or class.
  • The API returns the name of the best-matching classification. For example, if you provide classes like "damaged_vehicle", "undamaged_vehicle", and "not_a_vehicle", the API might return: "damaged_vehicle"

Output

Classification result as a string

The output is a single string representing the class label based on the image content.
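The request sketch below is illustrative only: the endpoint URL, authentication header, and payload envelope are assumptions for this example, while the image-classification action name and the objectKeys and classes fields are described on this page. Consult the API reference for the exact request format.

```python
import requests

# Hypothetical endpoint and token; take the real values from the API reference.
API_URL = "https://api.example.com/context/v1/process"
HEADERS = {"Authorization": "Bearer <token>"}

payload = {
    "objectKeys": ["uploads/claim-1234/vehicle.jpg"],  # input files must be images
    "actions": [{
        "image-classification": {
            # At least 2 distinct, non-empty classes are required.
            "classes": ["damaged_vehicle", "undamaged_vehicle", "not_a_vehicle"]
        }
    }],
}

response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=60)
response.raise_for_status()
print(response.json())  # a single class label, e.g. "damaged_vehicle"
```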

Schema: image-classification

| Attribute | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| type | string | Yes | Defines the classification category or label type. | "product-label" |

Generate Image Description

The Image Description action analyzes an image and generates a textual description of its contents. The following are the requirements:

  • Input files must be images.
  • objectKeys must contain only image paths. Note: All objectKeys must be distinct and use valid formats, such as PNG, JPG, or PDF.

How it works:

  • The API uses AI models to identify objects, scenes, and activities in the image.
  • It synthesizes these elements into a coherent description.
  • The result is returned as a natural language text string.

For example: ""A blue Honda CR-V SUV with visible damage to the front bumper parked in a driveway."

Output

String containing the generated description

This output is a single string that provides a descriptive summary of the image content.
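As with image classification, the sketch below assumes a hypothetical endpoint and payload envelope; only the image-description action name and the objectKeys field come from this page.

```python
import requests

API_URL = "https://api.example.com/context/v1/process"  # hypothetical; see the API reference

payload = {
    "objectKeys": ["uploads/claim-1234/vehicle.jpg"],  # image paths only
    "actions": [{"image-description": {}}],
}

response = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer <token>"}, timeout=60)
print(response.json())  # e.g. "A blue Honda CR-V SUV with visible damage to the front bumper..."
```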

Schema: image-description

| Attribute | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| type | string | Yes | A natural language description of the image content. | "A package lying on a desk next to a laptop" |

Generate Image Embeddings

The Image Embeddings action converts visual information into a high-dimensional vector representation. Input files must be images.

How it works:

  • The image is processed through an AI model designed to extract visual features.
  • The model converts these features into a dense vector (typically 512-1024 dimensions).
  • These vectors place visually similar images closer together in the vector space. The result is an array of floating-point numbers.

For example, [0.021, -0.065, 0.127, 0.036, -0.198, ... ]

These embeddings can be used for:

  • Finding visually similar images
  • Building image search systems
  • Clustering similar images together

Output

Vector of floats

The output is a vector representation of the image.
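Once the vectors are returned, similarity comparisons need no further API calls. Below is a minimal, self-contained sketch of cosine similarity; the five-dimensional toy vectors stand in for real 512-1024 dimensional output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two API responses.
embedding_1 = [0.021, -0.065, 0.127, 0.036, -0.198]
embedding_2 = [0.019, -0.070, 0.131, 0.040, -0.190]
print(cosine_similarity(embedding_1, embedding_2))  # near 1.0 => visually similar
```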

Schema: image-embeddings

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| output | List<float> | High-dimensional vector representation of the image. | [0.123, 0.456, 0.789] |

Generate and Match Image Metadata

The Image Metadata Generation action generates metadata from an input image and matches it against a set of provided metadata examples to return the most relevant results. The following are the requirements:

  • Input files must be images.
  • objectKeys must contain only image paths. Note: All objectKeys must be distinct and use valid formats, such as PNG, JPG, or PDF.
  • kSimilarMetadata must be provided and contain at least one item. Note: If image-metadata-generation is not specified, then kSimilarMetadata must be null or empty.

Provide kSimilarMetadata in a POST request to enhance the quality of the API response.

Tip: Providing multiple example objects is highly recommended, as it helps the API generate more accurate and contextually relevant results. Each item in kSimilarMetadata should be a dictionary (JSON object) containing representative example data.

For example, if the input image shows Times Square in New York City, your request might include metadata like:

```json
{
  "kSimilarMetadata": [
    {
      "event:location": "New Bristol, Terranova",
      "keywords:tags": "economy|markets|Noventis|GEF|Terranova|report",
      "photo type": "Landscape photography, Nature photography, Macro photography",
      "references:list": [
        "Getty Images: Times Square, New York City",
        "Shutterstock Editorial: Times Square NYC",
        "National Geographic Photo Archive: Times Square",
        "New York Public Library Digital Collections: Times Square",
        "Lonely Planet: Times Square Photo Guide"
      ],
      "summary:text": "This report provides a comprehensive analysis of financial trends and investment opportunities across emerging markets in the Terranova region, focusing on the strategies employed by Noventis Group."
    }
  ]
}
```

How it works:

  • You can provide example metadata structures to guide the generation.
  • The model analyzes the image and extracts relevant information.
  • It structures the information following your metadata templates. The result is a structured JSON object containing the metadata.

For example:

```json
{
  "car_metadata": {
    "manufacturer": "Honda",
    "model": "CR-V",
    "color": "blue",
    "damage_identified": {
      "car_part": "bumper",
      "damage_type": "cracked",
      "damage_severity": "mild"
    }
  }
}
```

Output

Dictionary of generated metadata fields and values

The output is a key-value mapping where each key represents a metadata field (e.g., title, author, date), and each value contains the corresponding data extracted or generated for an image.
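The sketch below shows one way such a request might be assembled. The endpoint URL and payload envelope are assumptions for illustration; objectKeys, the image-metadata-generation action name, and kSimilarMetadata come from this page.

```python
import requests

API_URL = "https://api.example.com/context/v1/process"  # hypothetical; see the API reference

payload = {
    "objectKeys": ["uploads/photos/times-square.jpg"],
    "actions": [{"image-metadata-generation": {}}],
    # Example objects guiding the structure of the generated metadata.
    "kSimilarMetadata": [{
        "event:location": "New Bristol, Terranova",
        "keywords:tags": "economy|markets|Noventis|GEF|Terranova|report",
        "photo type": "Landscape photography, Nature photography, Macro photography",
    }],
}

response = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer <token>"}, timeout=60)
print(response.json())  # dictionary of generated metadata fields and values
```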

Schema: image-metadata-generation

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| color | string | The color attribute of the item. | "red" |
| shape | string | The shape attribute of the item. | "rectangular" |
| barcode | string | Unique identifier typically used for scanning. | "1234567890" |

Detect Entities in Images

The Named Entity Recognition action detects specific entities visible in images, such as people, organizations, and locations. Input files must be images.

How it works:

  • The model analyzes the image to detect text and visual entities.
  • It categorizes detected entities into predefined types.
  • The result is a structured object containing entity types and values.

For example:

```json
{
  "organization": ["Honda", "CR-V"],
  "person": ["None"],
  "location": ["driveway"],
  "object": ["car", "bumper"]
}
```

Output

Dictionary of detected entities

The output represents detected named entities extracted from image content.
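Because the output maps each entity type to a list of values, consuming it is straightforward. A minimal sketch, using the example response above as a stand-in for a real API result:

```python
# Stand-in for the parsed JSON response of a named-entity-recognition-image action.
result = {
    "organization": ["Honda", "CR-V"],
    "person": ["None"],
    "location": ["driveway"],
    "object": ["car", "bumper"],
}

for entity_type, values in result.items():
    # Each entity type maps to a list of detected values.
    print(f"{entity_type}: {', '.join(values)}")
```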

Schema: named-entity-recognition-image

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| output | Dictionary<string, List<string>> | A mapping of entity types to a list of their detected values. Useful when multiple values can be associated with a single entity type. | { "Address": ["123 Main St", "New York"], "Date": ["2024-01-01"] } |

Extract Entities from Text

The Named Entity Recognition Text action identifies and categorizes named entities mentioned in text documents. Input files must be text (PDF) files.

How it works:

  • The model processes the text to identify entities like people, organizations, and locations.
  • It categorizes each entity into predefined types. The result is a structured object containing entity types and values.

For example:

```json
{
  "person": ["John Smith", "Jane Doe"],
  "organization": ["Hyland Software", "HR Department"],
  "date": ["2023-06-15", "January 1, 2024"],
  "location": ["Westlake, OH"]
}
```

Output

The output represents detected named entities extracted from text content.
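A request sketch for this action follows; the endpoint URL and payload envelope are assumptions, while objectKeys and the named-entity-recognition-text action name come from this page.

```python
import requests

API_URL = "https://api.example.com/context/v1/process"  # hypothetical; see the API reference

payload = {
    "objectKeys": ["uploads/hr/employment-agreement.pdf"],  # text (PDF) files only
    "actions": [{"named-entity-recognition-text": {}}],
}

response = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer <token>"}, timeout=60)
print(response.json())  # e.g. {"person": ["John Smith", ...], "organization": [...]}
```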

Schema: named-entity-recognition-text

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| Name | List<string> | A list of detected names associated with the entity. | ["John Doe"] |
| Email | List<string> | A list of detected email addresses associated with the entity. | ["john.doe@example.com"] |

Classify Text into Categories

The Text Classification action categorizes text into predefined categories. The following are its requirements:

  • Input files must be text.
  • The classes array must contain at least 2 distinct non-empty entries.

Note: If no classification actions (image-classification or text-classification) are specified, then the classes must be null or empty.

How it works:

  • You provide at least two classification classes.
  • The model analyzes the text content and determines the best matching class.
  • The result is the name of the matching classification.

For example, if you provide classes like "policy_document", "technical_manual", and "marketing_material", the API might return: "policy_document"

Output

Classification result as a string

The output is a single string representing the predicted class label based on the text content.
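Because requests with fewer than two distinct non-empty classes are rejected with validation errors, it can be worth checking the array client-side before sending. A minimal sketch of such a check (the helper is hypothetical, not part of the API):

```python
def validate_classes(classes):
    """Mirror the documented constraint: at least 2 distinct, non-empty entries."""
    cleaned = [c.strip() for c in classes if c and c.strip()]
    if len(set(cleaned)) < 2:
        raise ValueError("classes must contain at least 2 distinct non-empty entries")
    return cleaned

classes = validate_classes(["policy_document", "technical_manual", "marketing_material"])
print(classes)
```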

Schema: text-classification

| Field Name | Type | Example | Description |
| --- | --- | --- | --- |
| Type | string | "Invoice" | The class label. |

Generate Text Embeddings from Documents

The Text Embeddings action converts text into numerical vector representations that capture semantic meaning. This action requires the input files to be in plain text format.

How it works:

  • The text is processed through language models that understand context and meaning.
  • The model generates vectors where semantically similar texts are closer together.
  • The result is an array of floating-point numbers (typically 768-1536 dimensions).

For example: [0.041, 0.082, -0.153, 0.027, 0.194, ... ]

Text embeddings enable:

  • Semantic search capabilities
  • Document similarity comparison
  • Content recommendation systems
  • Clustering similar documents

Output

Matrix (list of lists of floats)

The output is a matrix of vector representations of the text, typically one vector per sentence.
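Since the output is one vector per sentence, a common use is scoring each sentence against a query vector. A minimal sketch using NumPy; the three-dimensional toy vectors stand in for real 768-1536 dimensional embeddings:

```python
import numpy as np

# Toy stand-ins for API output: one embedding vector per sentence.
doc_vectors = np.array([[0.12, 0.34, 0.56], [0.78, 0.90, 0.11]])
query_vector = np.array([0.10, 0.30, 0.60])

# Cosine similarity of the query against every sentence vector.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(int(scores.argmax()))  # index of the most semantically similar sentence
```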

Schema: text-embeddings

| Field Name | Type | Example | Description |
| --- | --- | --- | --- |
| Type | List<List<float>> | [[0.12, 0.34, 0.56], [0.78, 0.90, 0.11]] | A matrix where each element is a float. |

Generate and Match Text Metadata

The Text Metadata Generation action creates structured metadata for text documents, particularly useful for PDFs and long-form content. The following are its requirements:

  • Input files must be PDFs.
  • objectKeys must contain only PDF paths. Note: All objectKeys must be distinct and use a valid format (PDF).
  • kSimilarMetadata must be provided and contain at least one item. Note: If text-metadata-generation is not specified, then kSimilarMetadata must be null or empty.

Provide kSimilarMetadata in a POST request to enhance the quality of the API response.

Tip: Providing multiple example objects is highly recommended, as it helps the API generate more accurate and contextually relevant results. Each item in kSimilarMetadata should be a dictionary (JSON object) containing representative example data.

For example, if the input file is "Hyland Employee Handbook US Policies", your request might include metadata like:

 "KSimilarMetadata" : [
{
"document:title": "Emerging Markets Overview",
"document:date": "2024-11-15",
"entity:company": "Noventis Group",
"entity:organization": "Global Economic Forum",
"entity:person": "Alex R. Minden",
"event:location": "New Bristol, Terranova",
"keywords:tags": "economy|markets|Noventis|GEF|Terranova|report",
"document:type": "market analysis",
"document:category": "Economics & Finance",
"summary:text": "This report provides a comprehensive analysis of financial trends and investment opportunities across emerging markets in the Terranova region, focusing on the strategies employed by Noventis Group.",
"references:list": [
"Terranova Financial Bulletin, Vol. 12","GEF Annual Review 2023","Noventis Group Internal Strategy Memo"]
}
]

Output

The output is a key-value representation of predicted metadata for a PDF.
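The coupling rule between this action and kSimilarMetadata can be enforced client-side before a request is sent. A minimal sketch; the payload envelope (an actions list of single-key objects) is an assumption for illustration:

```python
def check_metadata_payload(payload):
    """Mirror the documented rule coupling text-metadata-generation and kSimilarMetadata."""
    actions = {name for action in payload.get("actions", []) for name in action}
    examples = payload.get("kSimilarMetadata") or []
    if "text-metadata-generation" in actions and not examples:
        raise ValueError("kSimilarMetadata must be provided and contain at least one item")
    if "text-metadata-generation" not in actions and examples:
        raise ValueError("kSimilarMetadata must be null or empty for this set of actions")

check_metadata_payload({
    "objectKeys": ["uploads/handbook.pdf"],
    "actions": [{"text-metadata-generation": {}}],
    "kSimilarMetadata": [{"document:title": "Emerging Markets Overview"}],
})
```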

Schema: text-metadata-generation

| Field Name | Type | Description |
| --- | --- | --- |
| Type | Dictionary of string to object | A key-value pair structure where keys are strings and values can be any data type. |

Example:

```json
{
  "color": "red",
  "shape": "rectangular",
  "barcode": "1234567890"
}
```

Generate Text Summary

The Text Summary action condenses lengthy documents into summaries that capture key information. Input files must be text (PDF) files.

The optional maxWordCount parameter specifies the maximum number of words allowed in the text summarization output. The following are its properties:

  • Default: 200
  • Constraints: Must be a number greater than 0.

Output

Summary string

The output is a generated summary of the text.
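A request sketch with maxWordCount set is shown below; the endpoint URL and payload envelope are assumptions, while objectKeys, the text-summarization action name, and maxWordCount come from this page.

```python
import requests

API_URL = "https://api.example.com/context/v1/process"  # hypothetical; see the API reference

payload = {
    "objectKeys": ["uploads/handbook.pdf"],  # text (PDF) files only
    "actions": [{"text-summarization": {"maxWordCount": 120}}],  # default 200; must be > 0
}

response = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer <token>"}, timeout=60)
print(response.json())  # a single summary string
```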

Schema: text-summarization

| Field Name | Type | Example | Description |
| --- | --- | --- | --- |
| Type | string | "This document describes the shipping process for international packages." | A textual summary or description represented as a string. |